The Anthropic Messages API create function accepts parameters to control model selection, output generation, system behavior, performance, and advanced features like tool use and extended thinking.
The Anthropic Messages API create function is designed for flexible conversational AI. Its parameters can be grouped into several logical categories: those that define the conversation flow, control output generation, manage performance and cost, and enable advanced features such as tool use and extended reasoning.
messages: The core of the request. This is an array of message objects, each with a role (user or assistant) and content. It represents the conversation history and the latest user input. It may also include optional cache_control markers for prompt caching.
model: A required parameter specifying which Claude model to use (e.g., claude-3-5-sonnet-20241022). This determines the model's capabilities, context window, and pricing.
system: Defines the system prompt, setting the assistant's behavior, persona, and instructions for the entire conversation. It can be provided as a simple string or an array of TextBlockParam objects for more complex system messages.
max_tokens: A required parameter setting the maximum number of tokens that Claude can generate in its response. This is a hard limit and prevents excessively long outputs.
stop_sequences: An optional array of custom strings that, when generated, will cause the model to stop producing further tokens. For example, you could set stop_sequences: ["</answer>"] to end generation after a specific XML tag is generated, or ["\n\nHuman:"] to stop before an expected user turn.
temperature: An optional parameter (range 0-1) controlling the randomness of the output. Higher values (e.g., 0.8) make the output more creative and diverse, while lower values (e.g., 0.2) make it more deterministic and focused.
top_p: An optional parameter for nucleus sampling. It dynamically selects the smallest set of tokens whose cumulative probability exceeds top_p (e.g., 0.9). This is an alternative to temperature for controlling randomness and should not be used simultaneously with it.
top_k: An optional parameter that limits the model to sampling from only the top k most likely tokens at each step (e.g., top_k: 40). It is a more traditional sampling method and is also deprecated in favor of temperature or top_p in modern model versions.
output_config: An optional parameter for configuring output-specific options. For example, output_config: { "output_format": "json" } can be used to request structured JSON output from the model. (Refer to Anthropic's latest docs for specific options).
tools: An array of tool definitions that the model can use. Each tool definition includes a name, a description, and an input_schema (JSON schema) defining the tool's expected parameters. This enables function calling and tool use.
tool_choice: Specifies how the model should use the provided tools. Options include "auto" (model decides), "any" (must use a tool), {"type": "tool", "name": "tool_name"} (force a specific tool), or {"type": "none"} to disable tool use.
thinking: Configuration for enabling Claude's extended thinking capabilities. This allows the model to reason step-by-step for complex tasks, potentially improving accuracy on challenging problems.
cache_control: This can be applied to specific content blocks within messages or system or as a top-level parameter. It marks cacheable content, allowing Anthropic to reuse cached data across API requests, reducing latency and costs.
metadata: An object for attaching metadata to the request, such as a user ID (user_id) for tracking usage and costs, or other custom identifiers.
service_tier: An optional parameter that determines whether to use priority (below-capacity) capacity ("priority") or standard capacity ("standard") for the request. Priority capacity offers faster processing at a higher cost but is not guaranteed to be available at all times.
inference_geo: An optional string to specify a geographic region for inference processing (e.g., "eu"). This can help with latency and compliance requirements. If omitted, the workspace's default is used.
container: An optional container identifier for reusing cached model states across multiple requests, which can be useful for optimizing repetitive tasks.
stream: A boolean parameter. Setting this to true enables streaming, where the API returns a stream of Server-Sent Events (SSE) that sends back chunks of the response as they're generated, improving perceived latency.
container, inference_geo, service_tier: These are typically used in larger enterprise or production deployments. container allows reuse of model state across requests, inference_geo selects the geographic region for processing, and service_tier determines capacity priority.
Deprecated Parameters: top_k is deprecated in favor of temperature or top_p, as they are more effective or provide better controls.